Eugen Tarnow

 

      Is there something fishy with Johns Hopkins University’s Pima Indians Diabetes Data Set?

      Eugen G Tarnow  April 7 2017 11:43:47 AM
      By Eugen Tarnow, Ph.D.
      Avalon Business Systems, Inc.
      http://AvalonAnalytics.com

      This is a famous dataset describing the incidence of diabetes in a population prone to the disease. It can be downloaded here: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

      Some thirty data science publications have resulted. But some strange things are going on in this dataset.

      First, the age distribution of the participants is exponential:

      [Figure: age distribution of the participants]
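      One way to check the age observation yourself is sketched below in Python (3.8 or later, standard library only). It assumes a local copy of the data file under the hypothetical name pima-indians-diabetes.data, header-less and comma-separated, with age as the eighth column (index 7) as in the UCI attribute list. Fitting a shifted exponential by maximum likelihood and comparing binned counts with the fitted prediction is my choice of check for illustration, not necessarily how the figure above was produced.

```python
import csv
import math
from collections import Counter

# Assumed column layout (UCI attribute list): 0 pregnancies, 1 glucose,
# 2 blood pressure, 3 skin thickness, 4 insulin, 5 BMI,
# 6 diabetes pedigree function, 7 age, 8 class (0/1).
AGE_COL = 7


def load_column(path, col):
    """Read one numeric column from the header-less, comma-separated file."""
    with open(path, newline="") as f:
        return [float(row[col]) for row in csv.reader(f) if row]


def fit_shifted_exponential(ages):
    """Maximum-likelihood fit of f(x) = rate * exp(-rate * (x - x0)), x >= x0.

    The MLE of x0 is the smallest observed value, and rate = 1 / (mean - x0).
    """
    x0 = min(ages)
    mean = sum(ages) / len(ages)
    return x0, 1.0 / (mean - x0)


def observed_vs_expected(ages, width=5):
    """Return (bin start, observed count, expected count) for each age bin."""
    x0, rate = fit_shifted_exponential(ages)
    n = len(ages)
    counts = Counter(int((a - x0) // width) for a in ages)
    rows = []
    for b in range(max(counts) + 1):
        lo = b * width
        # Probability mass of the fitted exponential in [lo, lo + width)
        p = math.exp(-rate * lo) - math.exp(-rate * (lo + width))
        rows.append((x0 + lo, counts.get(b, 0), n * p))
    return rows


def report(path):
    """Print observed vs. fitted counts for the age column of the data file."""
    ages = load_column(path, AGE_COL)
    for start, obs, exp in observed_vs_expected(ages):
        print(f"{start:5.0f}  observed {obs:4d}  expected {exp:7.1f}")
```

      Calling report("pima-indians-diabetes.data") prints one line per five-year bin. If the ages really follow an exponential, the observed and expected columns should track each other closely, and the observed counts should fall by a roughly constant factor from one bin to the next.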

      Second, the body mass index does not increase with age:

      [Figure: body mass index versus age]

      I wrote to the dataset's depositor but did not receive an answer. I also wrote to the archivists at the University of California, Irvine, and they decided to just leave the dataset up.

      But it seems there is something very wrong with it. And what about the publications that resulted? Are they therefore wrong too?

      As always, I reserve the right to be wrong.
